Scalable distributed speech recognition using Gaussian mixture model-based block quantisation

نویسندگان

  • Stephen So
  • Kuldip K. Paliwal
چکیده

In this paper, we investigate the use of block quantisers based on Gaussian mixture models (GMMs) for the coding of Mel frequency-warped cepstral coefficient (MFCC) features in distributed speech recognition (DSR) applications. Specifically, we consider the multi-frame scheme, where temporal correlation across MFCC frames is exploited by the Karhunen–Loève transform of the block quantiser. Compared with vector quantisers, the GMM-based block quantiser has relatively low computational and memory requirements which are independent of bitrate. More importantly, it is bitrate scalable, which means that the bitrate can be adjusted without the need for re-training. Static parameters such as the GMM and transform matrices are stored at the encoder and decoder and bit allocations are calculated on-the-fly without intensive processing. We have evaluated the quantisation scheme on the Aurora-2 database in a DSR framework. We show that jointly quantising more frames and using more mixture components in the GMM leads to higher recognition performance. The multi-frame GMM-based block quantiser achieves a word error rate (WER) of 2.5% at 800 bps, which is less than 1% degradation from the baseline (unquantised) word recognition accuracy, and graceful degradation down to a WER of 7% at 300 bps. 2005 Elsevier B.V. All rights reserved.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Scalable distributed speech recognition using multi-frame GMM-based block quantization

In this paper, we propose the use of the multi-frame Gaussian mixture model-based block quantizer for the coding of Mel frequencywarped cepstral coefficient (MFCC) features in distributed speech recognition (DSR) applications. This coding scheme exploits intraframe correlation via the Karhunen-Loéve transform (KLT) and interframe correlation via the joint processing of adjacent frames together ...

متن کامل

Recognizing the Emotional State Changes in Human Utterance by a Learning Statistical Method based on Gaussian Mixture Model

Speech is one of the most opulent and instant methods to express emotional characteristics of human beings, which conveys the cognitive and semantic concepts among humans. In this study, a statistical-based method for emotional recognition of speech signals is proposed, and a learning approach is introduced, which is based on the statistical model to classify internal feelings of the utterance....

متن کامل

Speech Enhancement using Laplacian Mixture Model under Signal Presence Uncertainty

In this paper an estimator for speech enhancement based on Laplacian Mixture Model has been proposed. The proposed method, estimates the complex DFT coefficients of clean speech from noisy speech using the MMSE  estimator, when the clean speech DFT coefficients are supposed mixture of Laplacians and the DFT coefficients of  noise are assumed zero-mean Gaussian distribution. Furthermore, the MMS...

متن کامل

Multi-frame GMM-based block quantisation of line spectral frequencies

In this paper, we investigate the use of the Gaussian mixture model-based block quantiser for coding line spectral frequencies that uses multiple frames and mean squared error as the quantiser selection criterion. As a viable alternative to vector quantisers, the GMM-based block quantiser encompasses both low computational and memory requirements as well as bitrate scalability. Jointly quantisi...

متن کامل

Subspace Gaussian Mixture Models for Large Vocabulary Speech Recognition

Subspace Gaussian mixture model(GMM) is an alternative approach to approximate the probabilistic density function (p.d.f) of a set of independent identical distributed (i.i.d) data with prior density estimates. In this approach, the prior density of GMM parameters is estimated from a development dataset, and when predict the new enrolled data, the prior knowledge can be utilised by criteria lik...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Speech Communication

دوره 48  شماره 

صفحات  -

تاریخ انتشار 2006